fine-grained complexity


On the Fine-Grained Complexity of Empirical Risk Minimization: Kernel Methods and Neural Networks

Backurs, Arturs, Indyk, Piotr, Schmidt, Ludwig

Neural Information Processing Systems

Empirical risk minimization (ERM) is ubiquitous in machine learning and underlies most supervised learning methods. While there is a large body of work on algorithms for various ERM problems, the exact computational complexity of ERM is still not understood. We address this issue for multiple popular ERM problems including kernel SVMs, kernel ridge regression, and training the final layer of a neural network. In particular, we give conditional hardness results for these problems based on complexity-theoretic assumptions such as the Strong Exponential Time Hypothesis. Under these assumptions, we show that there are no algorithms that solve the aforementioned ERM problems to high accuracy in sub-quadratic time. We also give similar hardness results for computing the gradient of the empirical loss, which is the main computational burden in many non-convex learning tasks.
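To see where the quadratic cost enters, here is a minimal NumPy sketch (illustrative, not taken from the paper) of the dense Gaussian kernel matrix that exact kernel SVM and kernel ridge regression solvers must form; all names and parameters are assumptions for the example.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """Dense Gaussian kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    All n^2 pairwise squared distances are materialized, so the cost is
    Theta(n^2 d). This is the quadratic step that the hardness results say
    cannot be avoided for high-accuracy kernel ERM (under SETH).
    """
    sq_norms = np.sum(X ** 2, axis=1)
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>, via broadcasting
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))

X = np.random.default_rng(0).normal(size=(100, 5))
K = gaussian_kernel_matrix(X)
print(K.shape)  # (100, 100)
```

The `np.maximum(..., 0.0)` guard clips tiny negative distances caused by floating-point cancellation, so the diagonal stays exactly at kernel value 1.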


The Fine-Grained Complexity of Gradient Computation for Training Large Language Models

Neural Information Processing Systems

Large language models (LLMs) have made fundamental contributions over the last few years. To train an LLM, one alternates between forward computations and backward computations. The forward computation can be viewed as attention function evaluation, and the backward computation can be viewed as gradient computation. Previous work by [Alman and Song, NeurIPS 2023] proved that the forward step can be performed in almost-linear time in certain parameter regimes, but that there is no truly sub-quadratic time algorithm in the remaining parameter regimes unless the popular hypothesis SETH is false. In this work, we show nearly identical results for the harder-seeming problem of computing the gradient of the loss function of a one-layer attention network, and thus for the entire process of LLM training.
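The forward computation the abstract refers to is exact softmax attention; a minimal NumPy sketch (an illustration of the standard operation, not the authors' code) makes the quadratic cost concrete:

```python
import numpy as np

def attention_forward(Q, K, V):
    """Exact softmax attention: Attn(Q, K, V) = softmax(Q K^T / sqrt(d)) V.

    Materializing the n x n score matrix costs Theta(n^2 d) time and
    Theta(n^2) memory. The cited results show that, outside certain
    parameter regimes, this quadratic cost is unavoidable for exact
    forward and backward passes, assuming SETH.
    """
    n, d = Q.shape
    S = Q @ K.T / np.sqrt(d)            # n x n attention scores
    S -= S.max(axis=1, keepdims=True)   # numerically stable softmax
    P = np.exp(S)
    P /= P.sum(axis=1, keepdims=True)   # row-stochastic attention weights
    return P @ V

rng = np.random.default_rng(1)
n, d = 64, 8
Q, K, V = rng.normal(size=(3, n, d))
out = attention_forward(Q, K, V)
print(out.shape)  # (64, 8)
```

The backward pass studied in the paper differentiates a loss through this same `P @ V` map, which is why its complexity landscape mirrors the forward computation's.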


Reviews: On the Fine-Grained Complexity of Empirical Risk Minimization: Kernel Methods and Neural Networks

Neural Information Processing Systems

The paper makes use of (relatively) recent advances in complexity theory to show that many common learning problems do not admit subquadratic-time learning algorithms (assuming the veracity of the "Strong Exponential Time Hypothesis"). I appreciate that the authors do not oversell their results: they clearly state that they provide a worst-case analysis. Also, the results are not surprising. For instance, finding the exact solution of any kernel method requires computing the full kernel matrix, which is already quadratic in the number of training examples. Reducing this computation time would imply that one can compute an approximation of the exact solution without computing the full kernel matrix, which is intuitively unlikely unless one makes extra assumptions on the problem structure (i.e., the nature of the data-generating distribution).


Fundamental Limitations on Subquadratic Alternatives to Transformers

Alman, Josh, Yu, Hantao

arXiv.org Artificial Intelligence

The Transformer architecture is widely deployed in many popular and impactful Large Language Models. At its core is the attention mechanism for calculating correlations between pairs of tokens. Performing an attention computation takes quadratic time in the input size, and has become the time bottleneck for transformer operations. In order to circumvent this, researchers have used a variety of approaches, including designing heuristic algorithms for performing attention computations faster, and proposing alternatives to the attention mechanism which can be computed more quickly. For instance, state space models such as Mamba were designed to replace attention with an almost-linear-time alternative. In this paper, we prove that any such approach cannot perform important tasks that Transformer is able to perform (assuming a popular conjecture from fine-grained complexity theory). We focus on document similarity tasks, where one is given as input many documents and would like to find a pair which is (approximately) the most similar. We prove that Transformer is able to perform this task, and we prove that this task cannot be performed in truly subquadratic time by any algorithm. Thus, any model which can be evaluated in subquadratic time - whether because of subquadratic-time heuristics for attention, faster attention replacements like Mamba, or any other reason - cannot perform this task. In other words, in order to perform tasks that (implicitly or explicitly) involve document similarity, one may as well use Transformer and cannot avoid its quadratic running time.
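The document-similarity task above is solvable by brute force in quadratic time; a minimal sketch (illustrative, using cosine similarity as one concrete notion of "most similar" - the paper's exact formalization may differ) shows the all-pairs scan that the hardness result says cannot be beaten by a truly subquadratic algorithm:

```python
import numpy as np

def most_similar_pair(docs):
    """Brute-force most-similar pair by cosine similarity.

    Checks all n(n-1)/2 pairs, so it runs in Theta(n^2 d) time. Under the
    fine-grained complexity conjecture discussed in the paper, no truly
    subquadratic algorithm solves this task, so no subquadratic-time model
    (heuristic attention, Mamba, etc.) can solve it either.
    """
    X = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    best, best_pair = -np.inf, None
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            s = float(X[i] @ X[j])
            if s > best:
                best, best_pair = s, (i, j)
    return best_pair, best

docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
pair, score = most_similar_pair(docs)
print(pair)  # (0, 2): the two near-duplicate documents
```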


Fine-Grained Complexity and Algorithms for the Schulze Voting Method

Sornat, Krzysztof, Williams, Virginia Vassilevska, Xu, Yinzhan

arXiv.org Artificial Intelligence

We study computational aspects of a well-known single-winner voting rule called the Schulze method [Schulze, 2003], which is used broadly in practice. In this method the voters give (weak) ordinal preference ballots which are used to define the weighted majority graph (WMG) of direct comparisons between pairs of candidates. The choice of the winner comes from indirect comparisons in the graph, and more specifically from considering directed paths instead of direct comparisons between candidates. When the input is the WMG, to our knowledge, the fastest algorithm for computing all possible winners in the Schulze method uses a folklore reduction to the All-Pairs Bottleneck Paths (APBP) problem and runs in $O(m^{2.69})$ time, where $m$ is the number of candidates. It is an interesting open question whether this can be improved. Our first result is a combinatorial algorithm with a nearly quadratic running time for computing all possible winners. If the input to the possible winners problem is not the WMG but the preference profile, then constructing the WMG is a bottleneck that increases the running time significantly; in the special case when there are $O(m)$ voters and candidates, the running time becomes $O(m^{2.69})$, or $O(m^{2.5})$ if there is a nearly-linear time algorithm for multiplying dense square matrices. To address this bottleneck, we prove a formal equivalence between the well-studied Dominance Product problem and the problem of computing the WMG. We prove a similar connection between the so-called Dominating Pairs problem and the problem of verifying whether a given candidate is a possible winner. Our paper is the first to bring fine-grained complexity into the field of computational social choice. Using it we can identify voting protocols that are unlikely to be practical for large numbers of candidates and/or voters, as their complexity is likely at least cubic.
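For concreteness, here is the textbook cubic-time baseline for Schulze winners from a WMG (a Floyd-Warshall-style widest-path computation, which is the standard published algorithm - not the nearly-quadratic combinatorial algorithm this paper develops; the example ballot counts are made up):

```python
def schulze_winners(wmg):
    """All Schulze winners given a weighted majority graph.

    wmg[a][b] = number of voters preferring candidate a to candidate b.
    Strongest-path strengths p[a][b] (widest paths, where a path's strength
    is its weakest link) are computed by a Floyd-Warshall-style DP in
    O(m^3) time. Candidate a is a winner if p[a][b] >= p[b][a] for all b.
    """
    m = len(wmg)
    # Only pairwise victories seed the path strengths.
    p = [[wmg[a][b] if wmg[a][b] > wmg[b][a] else 0 for b in range(m)]
         for a in range(m)]
    for k in range(m):                     # intermediate candidate
        for a in range(m):
            for b in range(m):
                if a != b and a != k and b != k:
                    p[a][b] = max(p[a][b], min(p[a][k], p[k][b]))
    return [a for a in range(m)
            if all(p[a][b] >= p[b][a] for b in range(m) if b != a)]

# 10 voters, 3 candidates in a Condorcet cycle: 0 beats 1, 1 beats 2, 2 beats 0.
wmg = [[0, 6, 4], [4, 0, 7], [6, 3, 0]]
print(schulze_winners(wmg))  # [0, 1]
```

The $O(m^{2.69})$ algorithm mentioned in the abstract replaces this cubic DP with a reduction to All-Pairs Bottleneck Paths solved via fast matrix multiplication.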



Finding the true potential of algorithms

#artificialintelligence

Each semester, Associate Professor Virginia Vassilevska Williams tries to impart one fundamental lesson to her computer-science undergraduates: Math is the foundation of everything. Often, students come into Williams' class, 6.006 (Introduction to Algorithms), wanting to dive into the advanced programming that powers the latest, greatest computing techniques. Her lessons instead focus on how algorithms are designed around core mathematical models and concepts. "When taking an algorithms class, many students expect to program a lot and perhaps use deep learning, but it's very mathematical and has very little programming," says Williams, the Steven G. (1968) and Renee Finn Career Development Professor who recently earned tenure in the Department of Electrical Engineering and Computer Science. "We don't have much time together in class (only two hours a week), but I hope in that time they get to see a little of the beauty of math -- because math allows you to see how and why everything works together. It really is a beautiful thing."